Skip to content

Encode modules with variable-length integers#2322

Merged
alexcrichton merged 1 commit into
bytecodealliance:mainfrom
alexcrichton:bincode-varint
Oct 26, 2020
Merged

Encode modules with variable-length integers#2322
alexcrichton merged 1 commit into
bytecodealliance:mainfrom
alexcrichton:bincode-varint

Conversation

@alexcrichton
Copy link
Copy Markdown
Member

Update Module::{serialize,deserialize} to use variable-length integers
with bincode to make the output artifacts smaller. Locally this
reduces the size of #2318 from 160 to 110 MB, a 30% decrease in size!
Deserialization performance is slightly slower, but seemingly within the
range of noise locally for me.

Update `Module::{serialize,deserialize}` to use variable-length integers
with `bincode` to make the output artifacts smaller. Locally this
reduces the size of bytecodealliance#2318 from 160 to 110 MB, a 30% decrease in size!
Deserialization performance is slightly slower, but seemingly within the
range of noise locally for me.
@github-actions github-actions Bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Oct 26, 2020
@github-actions
Copy link
Copy Markdown

Subscribe to Label Action

cc @peterhuene

Details This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@alexcrichton alexcrichton merged commit 2723385 into bytecodealliance:main Oct 26, 2020
@alexcrichton alexcrichton deleted the bincode-varint branch October 26, 2020 14:52
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 26, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as bytecodealliance#2321
or bytecodealliance#2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from bytecodealliance#2318
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 27, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as bytecodealliance#2321
or bytecodealliance#2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from bytecodealliance#2318
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Nov 3, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as bytecodealliance#2321
or bytecodealliance#2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from bytecodealliance#2318
alexcrichton added a commit that referenced this pull request Nov 3, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as #2321
or #2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from #2318
@fitzgen
Copy link
Copy Markdown
Member

fitzgen commented Nov 4, 2020

To be clear this is just enabling variable length encoding for integers, right? It is not switching to a delta encoding (which would ensure that essentially all addresses are 1-byte encoded) correct?

@alexcrichton
Copy link
Copy Markdown
Member Author

Ah yes indeed, a delta encoding might make it even more further compact.

I've so far shied away from big changes like that though. I think we may want to do it eventually, but it trades-off in-memory footprint for lookup speed since we can no longer quickly do a random lookup for a particular trapping pc. Not that we necessarily need that to be too too fast.

@fitzgen
Copy link
Copy Markdown
Member

fitzgen commented Nov 4, 2020

I'm only advocating for delta encoding in the serialized format, not the in-memory representation.

@tschneidereit
Copy link
Copy Markdown
Member

I'm only advocating for delta encoding in the serialized format, not the in-memory representation.

I think it'd be good to try not to have those two deviate too much: ideally we should at some point be able to mmap the on-disk representation entirely or almost entirely instead of having to read and deserialize it.

There clearly are size/speed tradeoffs to be had here, so that might not be the only thing we support, but I hope it can become a thing we support :)

cfallin pushed a commit that referenced this pull request Nov 30, 2020
Update `Module::{serialize,deserialize}` to use variable-length integers
with `bincode` to make the output artifacts smaller. Locally this
reduces the size of #2318 from 160 to 110 MB, a 30% decrease in size!
Deserialization performance is slightly slower, but seemingly within the
range of noise locally for me.
cfallin pushed a commit that referenced this pull request Nov 30, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as #2321
or #2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from #2318
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

wasmtime:api Related to the API of the `wasmtime` crate itself

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants